Co-occurrence of the Benford-like and Zipf Laws Arising from the Texts Representing Human and Artificial Languages
Abstract
We demonstrate that large texts in human (English, Russian, Ukrainian) and artificial (C++, Java) languages display quantitative patterns described by the Benford-like and Zipf laws. Under the Zipf law, the frequency of a word is inversely proportional to its rank, whereas the total numbers of occurrences of individual words in a text generate an uneven, Benford-like distribution of leading digits. Excluding the most frequent words substantially improves the correlation of the actual textual data with the Zipfian distribution, whereas the Benford-like distribution of leading digits (arising from the total number of occurrences of each word) is insensitive to the same exclusion procedure. The absolute values of the slopes of the double-logarithmic plots are markedly larger for the artificial languages (C++, Java) than for the human ones.
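The two measurements described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the regex tokenizer, the plain least-squares fit, the choice to keep original ranks when the top words are skipped, and the function names (`word_counts`, `zipf_slope`, `leading_digit_shares`, `benford`) are all assumptions made for illustration. The classical Benford formula is included only as a reference curve, since the abstract speaks of a "Benford-like" distribution.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Count lowercase word tokens (simple regex tokenizer, an assumption)."""
    return Counter(re.findall(r"[^\W\d_]+", text.lower()))


def zipf_slope(counts, skip_top=0):
    """Least-squares slope of log(frequency) versus log(rank).

    skip_top drops the most frequent words before fitting; the remaining
    words keep their original ranks (an assumption of this sketch).
    """
    freqs = sorted(counts.values(), reverse=True)
    points = [(math.log(rank), math.log(freq))
              for rank, freq in enumerate(freqs, start=1) if rank > skip_top]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def leading_digit_shares(counts):
    """Share of words whose occurrence count starts with each digit 1..9."""
    digits = Counter(int(str(c)[0]) for c in counts.values())
    total = sum(digits.values())
    return {d: digits.get(d, 0) / total for d in range(1, 10)}


def benford(d):
    """Classical Benford probability of leading digit d (reference curve)."""
    return math.log10(1 + 1 / d)
```

For a corpus loaded into a string `text`, one would compare `zipf_slope(word_counts(text))` with `zipf_slope(word_counts(text), skip_top=k)` for some small `k`, and set `leading_digit_shares(word_counts(text))` against `benford(d)` for `d` from 1 to 9. An ideal Zipfian text gives a slope of -1 on the double-logarithmic plot.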
Similar resources
Equilibrium (Zipf) and Dynamic (Grasseberg-Procaccia) method based analyses of human texts. A comparison of natural (english) and artificial (esperanto) languages
Two English texts by Lewis Carroll, one (Alice in Wonderland) also translated into Esperanto, the other (Through the Looking-Glass), are compared in order to observe whether natural and artificial languages significantly differ from each ...
The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages are still in question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...
Generalized Thermodynamics Underlying the Laws of Zipf and Benford
We demonstrate that the laws of Zipf and Benford, that govern scores of data generated by many and diverse kinds of human activity (as well as other data from natural phenomena), are the centerpiece expressions of a generalized thermodynamic structure. This structure is obtained from a deformed type of statistical mechanics that arises when configurational phase space is incompletely visited in...
Stigler’s approach to recovering the distribution of first significant digits in natural data sets
Benford’s Law can be seen as one of the many first significant digit (FSD) distributions in a family of monotonically decreasing distributions. We examine the interrelationship between Benford and other monotonically decreasing distributions such as those arising from Stigler, Zipf, and the power laws. We examine the theoretical basis of the Stigler distribution and extend his reasoning by inco...
Lexical Semantics and Selection of TAM in Bantu Languages: A Case of Semantic Classification of Kiswahili Verbs
The existing literature on Bantu verbal semantics demonstrated that inherent semantic content of verbs pairs directly with the selection of tense, aspect and modality formatives in Bantu languages like Chasu, Lucazi, Lusamia, and Shiyeyi. Thus, the gist of this paper is the articulation of semantic classification of verbs in Kiswahili based on the selection of TAM types. This is because the sem...
Publication date: 2018